-> Decomposed the time series in an effort to replicate Serena’s approach.
-> Fit a linear regression to the decomposed trend, again attempting to replicate Serena’s approach. Interesting because the result was similar, but not identical.
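The code chunks in this report were not echoed, so the following is only a sketch of the decomposition and trend regression; the object names `Wind`, `Windstorms`, and `Underlying.Wind` are taken from the Calls shown below, and everything else is an assumption.

```r
# Decompose the monthly windstorm counts (12-month frequency), then
# regress the extracted trend on a month index.
windTS <- decompose(ts(Wind$Windstorms, frequency = 12))
Underlying.Wind <- data.frame(
  Month                 = seq_along(windTS$trend),
  Underlying.Windstorms = as.numeric(windTS$trend)
)
fit.trend <- lm(Underlying.Windstorms ~ Month, data = Underlying.Wind)
summary(fit.trend)  # lm() drops the NA rows decompose() leaves at both ends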
##
## Call:
## lm(formula = Underlying.Windstorms ~ Month, data = Underlying.Wind)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.29786 -0.35209 -0.02924 0.29417 1.67280
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.9958362 0.0729440 68.489 < 2e-16 ***
## Month -0.0019724 0.0005248 -3.759 0.000215 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5633 on 238 degrees of freedom
## Multiple R-squared: 0.05603, Adjusted R-squared: 0.05206
## F-statistic: 14.13 on 1 and 238 DF, p-value: 0.0002151
-> The residuals are not randomly distributed.
-> When a linear regression model is suitable for a data set, the
residuals are scattered more or less randomly around the zero line.
-> Linear regression might not be suitable here.
-> The data show increasing, non-constant variance.
-> A variance-stabilizing transformation (log, square root, etc.) is needed.
-> No outliers observed.
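A minimal version of these residual checks, assuming the trend model is stored in an object named `fit.trend` (a hypothetical name; any `lm` fit works the same way):

```r
# Residuals vs. fitted values: a funnel shape signals non-constant variance,
# a curve signals non-linearity.
plot(fitted(fit.trend), resid(fit.trend),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
# Candidate variance-stabilizing transformations for a positive response:
fit.log  <- lm(log(Underlying.Windstorms)  ~ Month, data = Underlying.Wind)
fit.sqrt <- lm(sqrt(Underlying.Windstorms) ~ Month, data = Underlying.Wind)
```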
-> Basic linear regression using Month number as predictor.
##
## Call:
## lm(formula = Windstorms ~ Month, data = Wind)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.751 -1.571 -0.075 1.380 6.169
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.232846 0.282569 18.519 <2e-16 ***
## Month -0.003190 0.001936 -1.647 0.101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.236 on 250 degrees of freedom
## Multiple R-squared: 0.01074, Adjusted R-squared: 0.006779
## F-statistic: 2.713 on 1 and 250 DF, p-value: 0.1008
-> Same comments as above.
-> Using lagged predictors and a stepwise-reduced model.
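A sketch of the lag construction and reduction step. `embed()` is base R; the column name `Windstorms` comes from the Call above, and the use of AIC-based `step()` is an assumption — the reduced model in this report may have been selected by p-values instead.

```r
# embed(x, 37) places x[t], x[t-1], ..., x[t-36] side by side,
# one row per usable month, giving a response plus 36 lagged predictors.
x  <- Wind$Windstorms
df <- as.data.frame(embed(x, 37))
names(df) <- c("y", paste0("Lag", 1:36))
fit.full <- lm(y ~ ., data = df)
# Stepwise reduction by AIC (both directions), shown only as a sketch.
fit.red <- step(fit.full, direction = "both", trace = 0)
summary(fit.red)
```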
##
## Call:
## lm(formula = y ~ ., data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.802 -1.290 -0.007 1.011 4.787
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.303251 1.806995 2.935 0.00377 **
## Lag1 0.103447 0.071489 1.447 0.14963
## Lag2 0.199806 0.071967 2.776 0.00608 **
## Lag3 -0.008333 0.073545 -0.113 0.90991
## Lag4 -0.025409 0.074331 -0.342 0.73288
## Lag5 0.069941 0.074600 0.938 0.34974
## Lag6 -0.024152 0.074674 -0.323 0.74675
## Lag7 -0.069821 0.073514 -0.950 0.34351
## Lag8 -0.117487 0.073764 -1.593 0.11298
## Lag9 0.033978 0.074336 0.457 0.64816
## Lag10 0.096567 0.073381 1.316 0.18987
## Lag11 -0.030853 0.073493 -0.420 0.67513
## Lag12 -0.199938 0.072247 -2.767 0.00624 **
## Lag13 -0.052258 0.071358 -0.732 0.46493
## Lag14 -0.003627 0.071620 -0.051 0.95967
## Lag15 0.127517 0.072078 1.769 0.07857 .
## Lag16 -0.015484 0.071380 -0.217 0.82851
## Lag17 -0.027724 0.070675 -0.392 0.69532
## Lag18 0.021696 0.069917 0.310 0.75669
## Lag19 -0.126085 0.070002 -1.801 0.07336 .
## Lag20 0.031782 0.070812 0.449 0.65410
## Lag21 -0.005362 0.070998 -0.076 0.93988
## Lag22 -0.076261 0.070911 -1.075 0.28362
## Lag23 0.007255 0.070148 0.103 0.91774
## Lag24 -0.128366 0.069457 -1.848 0.06623 .
## Lag25 0.093584 0.069472 1.347 0.17966
## Lag26 0.068664 0.069930 0.982 0.32748
## Lag27 0.092229 0.069906 1.319 0.18874
## Lag28 0.123940 0.070584 1.756 0.08081 .
## Lag29 -0.042756 0.069867 -0.612 0.54134
## Lag30 -0.146594 0.070340 -2.084 0.03857 *
## Lag31 -0.185397 0.070938 -2.614 0.00972 **
## Lag32 0.017330 0.071628 0.242 0.80910
## Lag33 0.001821 0.071590 0.025 0.97974
## Lag34 -0.104696 0.071614 -1.462 0.14551
## Lag35 0.167938 0.071539 2.347 0.01999 *
## Lag36 0.005321 0.071072 0.075 0.94040
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.874 on 179 degrees of freedom
## Multiple R-squared: 0.4215, Adjusted R-squared: 0.3051
## F-statistic: 3.623 on 36 and 179 DF, p-value: 6.343e-09
## [1] 1.705796
##
## Call:
## lm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 +
## Lag19 + Lag24 + Lag25 + Lag28 + Lag30 + Lag31 + Lag34 + Lag35,
## data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.1008 -1.2866 0.0711 1.0925 4.8070
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.05235 1.09800 4.601 7.42e-06 ***
## Lag1 0.10826 0.06235 1.736 0.084046 .
## Lag2 0.23490 0.06141 3.825 0.000174 ***
## Lag8 -0.11370 0.06304 -1.803 0.072810 .
## Lag10 0.09033 0.06345 1.424 0.156080
## Lag12 -0.21746 0.06331 -3.435 0.000720 ***
## Lag15 0.16968 0.06262 2.710 0.007318 **
## Lag19 -0.17127 0.06098 -2.809 0.005465 **
## Lag24 -0.12018 0.06055 -1.985 0.048528 *
## Lag25 0.09612 0.06013 1.599 0.111482
## Lag28 0.10275 0.05923 1.735 0.084326 .
## Lag30 -0.11305 0.05876 -1.924 0.055791 .
## Lag31 -0.16941 0.06100 -2.777 0.006003 **
## Lag34 -0.11002 0.06118 -1.798 0.073631 .
## Lag35 0.13733 0.06197 2.216 0.027809 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.807 on 201 degrees of freedom
## Multiple R-squared: 0.3956, Adjusted R-squared: 0.3535
## F-statistic: 9.397 on 14 and 201 DF, p-value: 7.787e-16
## [1] 1.74355
-> Improved randomness in the residual distribution.
-> Variance is improved and appears relatively constant.
-> Linearity is preserved.
-> No outliers observed.
-> Overall trend preserved.
##
## Call:
## lm(formula = y ~ ., data = df.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2057 -1.1163 -0.0251 1.0474 4.3263
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.1652666 1.8937659 3.256 0.00143 **
## Lag1 0.1651594 0.0801662 2.060 0.04128 *
## Lag2 0.1712802 0.0798110 2.146 0.03364 *
## Lag3 0.0054553 0.0801964 0.068 0.94587
## Lag4 0.0197786 0.0796852 0.248 0.80435
## Lag5 0.0440437 0.0792582 0.556 0.57933
## Lag6 0.0156823 0.0790371 0.198 0.84302
## Lag7 -0.0864357 0.0774582 -1.116 0.26643
## Lag8 -0.1864329 0.0772313 -2.414 0.01711 *
## Lag9 0.0175569 0.0782057 0.224 0.82271
## Lag10 0.1493430 0.0777118 1.922 0.05673 .
## Lag11 -0.0077403 0.0781662 -0.099 0.92127
## Lag12 -0.1923934 0.0766485 -2.510 0.01324 *
## Lag13 -0.0955724 0.0756511 -1.263 0.20863
## Lag14 -0.0784204 0.0753650 -1.041 0.29993
## Lag15 0.1617555 0.0750741 2.155 0.03295 *
## Lag16 -0.0557097 0.0747308 -0.745 0.45727
## Lag17 -0.0571948 0.0737398 -0.776 0.43931
## Lag18 0.0444990 0.0733280 0.607 0.54496
## Lag19 -0.0944010 0.0729117 -1.295 0.19761
## Lag20 -0.0004723 0.0737548 -0.006 0.99490
## Lag21 -0.0552491 0.0740898 -0.746 0.45713
## Lag22 -0.0752446 0.0739539 -1.017 0.31074
## Lag23 0.0382961 0.0736523 0.520 0.60394
## Lag24 -0.1153656 0.0725775 -1.590 0.11426
## Lag25 0.0787099 0.0723410 1.088 0.27850
## Lag26 0.1006884 0.0731419 1.377 0.17089
## Lag27 0.0127424 0.0732255 0.174 0.86211
## Lag28 0.1631044 0.0737925 2.210 0.02876 *
## Lag29 -0.0279901 0.0730933 -0.383 0.70236
## Lag30 -0.1217811 0.0739390 -1.647 0.10186
## Lag31 -0.1904145 0.0747048 -2.549 0.01192 *
## Lag32 0.0125182 0.0748956 0.167 0.86751
## Lag33 -0.0050747 0.0753990 -0.067 0.94644
## Lag34 -0.1277886 0.0770638 -1.658 0.09958 .
## Lag35 0.1403681 0.0770323 1.822 0.07062 .
## Lag36 -0.0676444 0.0765462 -0.884 0.37841
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.813 on 136 degrees of freedom
## Multiple R-squared: 0.4811, Adjusted R-squared: 0.3438
## F-statistic: 3.503 on 36 and 136 DF, p-value: 7.205e-08
## [1] 2.261222
##
## Call:
## lm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 +
## Lag19 + Lag24 + Lag26 + Lag28 + Lag30 + Lag31 + Lag34 + Lag35,
## data = df.train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9073 -1.2893 0.0145 1.0613 4.3768
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.61850 1.17526 3.930 0.000127 ***
## Lag1 0.14579 0.06943 2.100 0.037323 *
## Lag2 0.19075 0.06683 2.854 0.004892 **
## Lag8 -0.16470 0.06607 -2.493 0.013704 *
## Lag10 0.12824 0.06743 1.902 0.059001 .
## Lag12 -0.20981 0.06668 -3.147 0.001975 **
## Lag15 0.17186 0.06526 2.633 0.009296 **
## Lag19 -0.10863 0.06392 -1.700 0.091186 .
## Lag24 -0.12989 0.06293 -2.064 0.040638 *
## Lag26 0.09544 0.06356 1.502 0.135176
## Lag28 0.16378 0.06129 2.672 0.008329 **
## Lag30 -0.09574 0.06128 -1.562 0.120222
## Lag31 -0.18578 0.06424 -2.892 0.004367 **
## Lag34 -0.13432 0.06504 -2.065 0.040558 *
## Lag35 0.15682 0.06587 2.381 0.018469 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.736 on 158 degrees of freedom
## Multiple R-squared: 0.4473, Adjusted R-squared: 0.3983
## F-statistic: 9.133 on 14 and 158 DF, p-value: 1.749e-14
## [1] 2.145673
-> Same as previous comments.
##
## Call:
## lm(formula = Windstorms ~ Month, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6885 -1.6722 -0.0993 1.3928 6.2098
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.302632 0.308831 17.170 <2e-16 ***
## Month -0.004067 0.002550 -1.595 0.112
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.224 on 207 degrees of freedom
## Multiple R-squared: 0.01214, Adjusted R-squared: 0.007365
## F-statistic: 2.543 on 1 and 207 DF, p-value: 0.1123
## [1] 2.299485
-> Further improved randomness in the training-data residual
distribution.
-> Variance is further improved and appears relatively constant.
-> Linearity is still preserved.
-> No outliers observed in training data.
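The hold-out evaluation can be sketched as follows. The chronological 80/20 split and the names `df.train`/`df.test` are assumptions (only `df.train` appears in the Calls); the bare `[1] ...` numbers printed after each summary in this report appear to be test RMSE values of this kind.

```r
# Chronological split: train on the earlier months, test on the later ones.
n        <- nrow(df)
cut      <- floor(0.8 * n)
df.train <- df[1:cut, ]
df.test  <- df[(cut + 1):n, ]
fit  <- lm(y ~ ., data = df.train)
pred <- predict(fit, newdata = df.test)
sqrt(mean((df.test$y - pred)^2))   # test RMSE
```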
##
## Call:
## glm(formula = y ~ ., family = "poisson", data = df.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.96305 -0.61141 0.02118 0.46553 2.01335
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.8243562 0.4859019 3.755 0.000174 ***
## Lag1 0.0319557 0.0201206 1.588 0.112239
## Lag2 0.0343060 0.0204484 1.678 0.093408 .
## Lag3 -0.0006835 0.0206323 -0.033 0.973572
## Lag4 0.0018537 0.0195971 0.095 0.924641
## Lag5 0.0079283 0.0198476 0.399 0.689554
## Lag6 0.0044299 0.0202349 0.219 0.826711
## Lag7 -0.0129419 0.0199265 -0.649 0.516025
## Lag8 -0.0403506 0.0202402 -1.994 0.046197 *
## Lag9 0.0011141 0.0197335 0.056 0.954976
## Lag10 0.0314161 0.0203940 1.540 0.123448
## Lag11 0.0028507 0.0200140 0.142 0.886737
## Lag12 -0.0441263 0.0195849 -2.253 0.024254 *
## Lag13 -0.0183676 0.0196366 -0.935 0.349593
## Lag14 -0.0136586 0.0194065 -0.704 0.481546
## Lag15 0.0320105 0.0186735 1.714 0.086489 .
## Lag16 -0.0085467 0.0187153 -0.457 0.647910
## Lag17 -0.0100425 0.0184779 -0.543 0.586794
## Lag18 0.0066532 0.0183591 0.362 0.717059
## Lag19 -0.0224879 0.0186172 -1.208 0.227082
## Lag20 0.0016922 0.0191991 0.088 0.929764
## Lag21 -0.0080811 0.0190933 -0.423 0.672120
## Lag22 -0.0148979 0.0191178 -0.779 0.435821
## Lag23 0.0077668 0.0185850 0.418 0.676015
## Lag24 -0.0334716 0.0191717 -1.746 0.080830 .
## Lag25 0.0207022 0.0185789 1.114 0.265157
## Lag26 0.0218580 0.0188488 1.160 0.246190
## Lag27 0.0062525 0.0184735 0.338 0.735019
## Lag28 0.0354825 0.0186218 1.905 0.056724 .
## Lag29 -0.0053556 0.0188092 -0.285 0.775848
## Lag30 -0.0250875 0.0194278 -1.291 0.196594
## Lag31 -0.0458637 0.0196851 -2.330 0.019813 *
## Lag32 -0.0034065 0.0194956 -0.175 0.861293
## Lag33 -0.0018370 0.0195285 -0.094 0.925057
## Lag34 -0.0244465 0.0204344 -1.196 0.231564
## Lag35 0.0289662 0.0198620 1.458 0.144738
## Lag36 -0.0116835 0.0201027 -0.581 0.561111
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 192.19 on 172 degrees of freedom
## Residual deviance: 103.92 on 136 degrees of freedom
## AIC: 749.12
##
## Number of Fisher Scoring iterations: 4
## [1] 2.191144
##
## Call:
## glm(formula = y ~ Lag1 + Lag2 + Lag8 + Lag10 + Lag12 + Lag15 +
## Lag19 + Lag24 + Lag28 + Lag31 + Lag34 + Lag35, family = "poisson",
## data = df.train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.05023 -0.60999 0.03858 0.47768 1.95468
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.50575 0.29010 5.190 2.1e-07 ***
## Lag1 0.03945 0.01749 2.255 0.0241 *
## Lag2 0.04375 0.01732 2.526 0.0115 *
## Lag8 -0.03673 0.01771 -2.074 0.0380 *
## Lag10 0.03114 0.01788 1.742 0.0815 .
## Lag12 -0.04858 0.01765 -2.753 0.0059 **
## Lag15 0.03967 0.01683 2.357 0.0184 *
## Lag19 -0.03177 0.01667 -1.906 0.0567 .
## Lag24 -0.02982 0.01665 -1.791 0.0732 .
## Lag28 0.03511 0.01608 2.183 0.0290 *
## Lag31 -0.04409 0.01748 -2.522 0.0117 *
## Lag34 -0.02991 0.01725 -1.734 0.0829 .
## Lag35 0.03139 0.01705 1.841 0.0657 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 192.19 on 172 degrees of freedom
## Residual deviance: 112.13 on 160 degrees of freedom
## AIC: 709.33
##
## Number of Fisher Scoring iterations: 4
## [1] 2.139466
-> No real observable improvement over the previous models.
-> Lag12 may be significant, suggesting a cyclical (annual) pattern that
repeats every 12 lags (e.g., Lag12, Lag24, Lag36).
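The Poisson fits above can be sketched like this; since `glm` uses a log link, each coefficient acts multiplicatively on the expected count. The AIC-based `step()` reduction is an assumption about how the smaller model was chosen.

```r
# Poisson GLM on the lagged training data, scored by the same hold-out RMSE.
fit.pois     <- glm(y ~ ., family = "poisson", data = df.train)
fit.pois.red <- step(fit.pois, trace = 0)   # AIC-based reduction (a sketch)
pred <- predict(fit.pois.red, newdata = df.test, type = "response")
sqrt(mean((df.test$y - pred)^2))            # test RMSE on the count scale
```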
## Series: trend
## ARIMA(0,1,0)(1,0,1)[12]
##
## Coefficients:
## sar1 sma1
## -0.0533 -0.8644
## s.e. 0.0805 0.0594
##
## sigma^2 = 0.01455: log likelihood = 158.57
## AIC=-311.13 AICc=-311.03 BIC=-300.7
-> The output shows that the best-fitting ARIMA model for the trend
component of the time series has non-seasonal order (0,1,0) and seasonal
order (1,0,1) with a period of 12 months (denoted [12]).
-> The log likelihood of the model is 158.57, and the Akaike
Information Criterion (AIC), corrected AIC (AICc), and Bayesian
Information Criterion (BIC) values are -311.13, -311.03, and -300.7,
respectively.
-> The ARIMA model suggests that the trend component of the time
series is best represented as a random walk (the non-seasonal part is
ARIMA(0,1,0), and no drift term appears in the output), combined with
seasonal AR and MA terms of order one: ARIMA(0,1,0)(1,0,1)[12].
-> One potential takeaway from this analysis is that there is a
seasonal pattern in the trend of the number of windstorms per
month.
-> The non-seasonal part of the model has order (0,1,0), meaning the
series is differenced once and no AR or MA terms are fitted to the
differenced series; the trend itself behaves like a random walk. This
suggests the trend component is not stationary in level, wandering over
time rather than reverting to a fixed mean.
-> The seasonal part of the model has order (1,0,1)[12], indicating
seasonality with a period of 12 months. Specifically, the model includes
both a seasonal autoregressive term (driven by the value of the series
at the same month in the previous year) and a seasonal moving-average
term (driven by the random shock from the same month in the previous
year).
-> The relatively small estimated sigma-squared (0.01455) indicates
that the variance of the trend component's one-step innovations is low,
which could suggest that the series is relatively stable and
predictable. However, caution should be exercised when interpreting this
finding: the ARIMA model assumes the differenced series is stationary,
which may not hold for the original time series.
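The ARIMA step above can be sketched as follows; `auto.arima()` and `forecast()` are from the `forecast` package, and `windTS$trend` is the decomposed trend object named in the Call below.

```r
# Refit and forecast the trend component of the decomposition.
library(forecast)
trend <- na.omit(windTS$trend)      # decompose() leaves NAs at both ends
fit.arima <- auto.arima(trend)
fit.arima                           # selected ARIMA(0,1,0)(1,0,1)[12] above
plot(forecast(fit.arima, h = 12))   # 12-month-ahead forecast
```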
##
## Call:
## lm(formula = windTS$trend ~ index(windTS$trend))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.29786 -0.35209 -0.02924 0.29417 1.67280
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.367429 12.667130 4.134 4.94e-05 ***
## index(windTS$trend) -0.023669 0.006297 -3.759 0.000215 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5633 on 238 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.05603, Adjusted R-squared: 0.05206
## F-statistic: 14.13 on 1 and 238 DF, p-value: 0.0002151
-> The decomposition of the time series into its components showed
that there is a seasonal pattern in the data, with the number of storms
peaking in the winter months and decreasing in the summer months.
-> The trend component of the time series showed a slight decrease in
the number of storms over time, although the relationship with time is
not very strong.
-> The auto.arima function suggested an ARIMA(0,1,0)(1,0,1)[12] model
for the trend component of the time series, which includes a seasonal
component and a moving average component.
-> The diagnostic plots for the ARIMA model showed that the residuals
were approximately normally distributed and had constant variance over
time, suggesting that the model is a good fit for the data.
-> Based on the forecast from the ARIMA model, it is predicted that
the number of storms will continue to decrease slightly over the next 12
months.
-> The main takeaway from this analysis is that while there is
evidence of a slight downward trend in the number of windstorms over
time, the relationship is not very strong and there is still
considerable variability in the data. Therefore, it is important to
continue monitoring the data and update the analysis as more data
becomes available.